There are numerous methods for understanding a convolutional neural network by visualizing it. However, most repositories and libraries that implement these techniques only work for specific models, such as a VGG or AlexNet pretrained on ImageNet. So, I created timm-vis, a library that provides a variety of visualization methods that work on any model trained on any dataset. The only requirement is that the model is an image classifier built with PyTorch. The 8 visualization techniques are described in detail below. If you would like to try them out, start with details.ipynb in the repository.

from timm_vis.methods import *
import timm
from PIL import Image

Throughout the notebook, an EfficientNet B0 pretrained on ImageNet and an image of a dog ("chow chow" - class 260) will be used.

model = timm.create_model(model_name = 'efficientnet_b0', pretrained = True)
img = Image.open('chow.jpg').resize((512, 512))
img
Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b0_ra-3dd342df.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_ra-3dd342df.pth

1. Visualize filters

The visualize_filters function plots the filters of a convolutional layer by interpreting each channel as a grayscale image.

Parameters:

  • model: PyTorch image classifier
  • filter_name: name of the layer whose filters are visualized, defaults to first layer
  • max_filters: maximum number of filters to be displayed, defaults to 64
  • size: size to which filters are upsampled, defaults to 128
  • figsize: size of the pyplot figure, defaults to (16, 16)
  • save_path: path where generated plot is saved, defaults to None

Below, 25 filters of the second convolutional layer (named 'blocks.0.0.conv_dw.weight') are plotted. The name of a layer can be found by iterating over model.named_parameters(). If the number of filters exceeds max_filters, max_filters random filters are plotted.

visualize_filters(model, 'blocks.0.0.conv_dw.weight', max_filters = 25)
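Conceptually, all this function needs is the layer's weight tensor. A minimal sketch of the idea (my illustration, not the library's implementation), assuming a conv weight of shape (out_channels, in_channels, k, k):

```python
import torch

def filters_to_grayscale(weight, max_filters=64):
    """Flatten a conv weight (out, in, k, k) into per-channel grayscale
    images, each normalized to [0, 1] for plotting."""
    channels = weight.detach().reshape(-1, *weight.shape[-2:])  # (out*in, k, k)
    if channels.shape[0] > max_filters:
        # More channels than we can show: pick a random subset.
        idx = torch.randperm(channels.shape[0])[:max_filters]
        channels = channels[idx]
    mins = channels.amin(dim=(1, 2), keepdim=True)
    maxs = channels.amax(dim=(1, 2), keepdim=True)
    return (channels - mins) / (maxs - mins + 1e-8)
```

Each returned k×k map can then be shown with `plt.imshow(..., cmap='gray')` after upsampling.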

2. Visualize activations

The visualize_activations function plots the activations of a specific layer given a specific image.

Parameters:

  • model: PyTorch image classifier
  • module: layer whose activations are recorded
  • img_path: path to image
  • max_acts: maximum number of activations to be displayed, defaults to 64
  • figsize: size of the pyplot figure, defaults to (16, 16)
  • save_path: path where generated plot is saved, defaults to None

The image at img_path is converted to a tensor and fed to the model. The outputs of module are stored and displayed. Each channel in the intermediate output tensor is interpreted as a grayscale image. Below are the activations of the model's first convolutional layer for the image of the dog.

visualize_activations(model, model.conv_stem, 'chow.jpg')
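Capturing the output of an intermediate module like this is typically done with a forward hook. A minimal sketch of the mechanism (my illustration, shown on a toy model rather than the EfficientNet):

```python
import torch
import torch.nn as nn

def get_activations(model, module, x):
    """Run a forward pass and capture the output of `module` via a hook."""
    acts = []
    handle = module.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach()))
    with torch.no_grad():
        model(x)
    handle.remove()  # always clean up the hook
    return acts[0]

# Toy example: capture the first conv layer's output of a small model.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 4, 3, padding=1))
a = get_activations(model, model[0], torch.randn(1, 3, 32, 32))
```

Each of the 8 channels in `a` can then be plotted as a 32×32 grayscale image.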

3. Maximally Activated Patches

The maximally_activated_patches function plots the patches of the image that contribute most to the model's prediction, found by occluding parts of the image during repeated forward passes.

Parameters:

  • model: PyTorch image classifier
  • img_path: path to image
  • patch_size: size of patch, defaults to 448
  • stride: stride of the sliding patches, defaults to 100
  • num_patches: number of top patches to display, defaults to 5
  • figsize: size of the pyplot figure, defaults to (16, 16)
  • device: device to use for computation, defaults to cuda
  • save_path: path where generated plot is saved, defaults to None

To find the maximally activated patches, parts of the image (patches) are occluded. The occlusions that produce the largest change in the predicted scores for the top class are ranked higher than those that produce minimal changes. The top num_patches patches of the image are plotted. Below are the top 5 patches of the dog image.

maximally_activated_patches(model, 'chow.jpg')
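The occlusion procedure described above can be sketched as follows (my illustration under assumed defaults, not the library's code): zero out one patch at a time and rank patches by how much the top-class score drops.

```python
import torch

def occlusion_scores(model, x, patch_size=64, stride=64):
    """Slide an occluder over the input tensor and record how much the
    top-class score drops for each patch position."""
    model.eval()
    with torch.no_grad():
        base = model(x).softmax(dim=1)
        top_class = base.argmax(dim=1).item()
        base_score = base[0, top_class].item()
        results = []
        for top in range(0, x.shape[2] - patch_size + 1, stride):
            for left in range(0, x.shape[3] - patch_size + 1, stride):
                occluded = x.clone()
                occluded[:, :, top:top + patch_size, left:left + patch_size] = 0
                score = model(occluded).softmax(dim=1)[0, top_class].item()
                # A larger drop means the patch mattered more to the prediction.
                results.append((base_score - score, top, left))
    return sorted(results, reverse=True)
```

The highest-ranked (drop, top, left) tuples index the most important patches of the image.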

4. Saliency map

The saliency_map function plots the gradient of the predicted score with respect to each pixel in the input image.

Parameters:

  • model: PyTorch image classifier
  • img_path: path to image
  • figsize: size of the pyplot figure, defaults to (16, 16)
  • device: device to use for computation, defaults to cuda
  • save_path: path where generated plot is saved, defaults to None

The gradient of the un-normalized class score is calculated with respect to the pixels of the image. The absolute value is taken, followed by the maximum over the three channels. Lighter parts of the map correspond to gradients of higher magnitude. Below is the saliency map of the dog image.

saliency_map(model, 'chow.jpg')
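The computation is a single backward pass. A minimal sketch (my illustration), assuming a preprocessed input tensor:

```python
import torch

def compute_saliency(model, x):
    """Gradient of the top (un-normalized) class score w.r.t. the input,
    reduced to one channel via |grad| and a max over RGB."""
    model.eval()
    x = x.clone().requires_grad_(True)
    scores = model(x)                      # raw logits, no softmax
    scores[0, scores.argmax()].backward()  # d(top score) / d(pixels)
    return x.grad.abs().amax(dim=1)[0]     # (H, W) saliency map
```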

5. Generate synthetic image

The generate_image function generates a synthetic image that maximizes the score of a specific class.

Parameters:

  • model: PyTorch image classifier
  • target_class: the class whose score is maximized
  • epochs: number of epochs to execute gradient ascent for
  • min_prob: minimum probability of the target class; gradient ascent stops early once the confidence score for the target class exceeds min_prob
  • lr: learning rate
  • weight_decay: weight decay used for L2 regularization
  • step_size: step size for learning rate scheduler, defaults to 100
  • gamma: gamma used for learning rate scheduler, defaults to 0.6
  • noise_size: size of initial noise, defaults to 224
  • p_freq: printing frequency, defaults to 50
  • init: function used when initializing noise, defaults to torch.randn
  • device: device to use for computation, defaults to cuda
  • figsize: size of the pyplot figure, defaults to (6, 6)
  • save_path: path where generated plot is saved, defaults to None

The input to the model is initialized using the init function. A forward pass is performed to compute gradients of the target class score with respect to the input, and the input is updated to maximize that score. This process repeats for epochs iterations or until the model predicts the target class with probability at least min_prob. The function call below generates a synthetic image for which the model predicts class 130 (flamingo) with a confidence of ~0.91.

synthetic_image = generate_image(model = model, target_class = 130, epochs = 500, min_prob = 0.9, lr = 10, weight_decay = 5e-2, 
                        step_size = 100, gamma = 0.9)
Epoch: 50 Confidence score for class 130: 0.04519936442375183
Epoch: 100 Confidence score for class 130: 0.19548705220222473
Epoch: 150 Confidence score for class 130: 0.2422645539045334
Reached 0.9113105535507202 confidence score in epoch 187. Stopping early.
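The gradient-ascent loop behind this can be sketched as follows (my simplified illustration: plain SGD whose weight_decay option serves as the L2 penalty on the image, without the scheduler or early stopping):

```python
import torch

def ascend_to_class(model, target_class, epochs=100, lr=0.5,
                    weight_decay=1e-2, noise_size=64):
    """Gradient ascent on the input: maximize the target class score.
    weight_decay acts as an L2 penalty on the image being optimized."""
    model.eval()
    img = torch.randn(1, 3, noise_size, noise_size, requires_grad=True)
    # Note: we optimize the *input*, not the model weights.
    opt = torch.optim.SGD([img], lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        loss = -model(img)[0, target_class]  # negate the score => ascent
        loss.backward()
        opt.step()
    return img.detach()
```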

6. Fool model

The fool_model function modifies an input image such that the model's score for a target class is maximized.

Parameters:

  • model: PyTorch image classifier
  • img_path: path to image
  • target_class: the class whose score is maximized
  • epochs: number of epochs to execute gradient ascent for
  • min_prob: minimum probability of the target class; gradient ascent stops early once the confidence score for the target class exceeds min_prob
  • lr: learning rate
  • step_size: step size for learning rate scheduler, defaults to 100
  • gamma: gamma used for learning rate scheduler, defaults to 0.6
  • p_freq: printing frequency, defaults to 50
  • init: function used when initializing noise, defaults to torch.randn
  • device: device to use for computation, defaults to cuda
  • figsize: size of the pyplot figure, defaults to (6, 6)
  • save_path: path where generated plot is saved, defaults to None

The input image is modified through the same procedure used in the generate_image function. The only difference between the two methods is that this method feeds the image at img_path to the model instead of a random tensor. The function call below modifies the image of the chow chow only slightly, yet the model predicts that the modified image is of class 724 (pirate ship) with high confidence.

adv = fool_model(model = model, img_path = 'chow.jpg', target_class = 724, epochs = 500, 
                       min_prob = 0.9, lr = 5e-1, step_size = 100, gamma = 0.8)
Reached 0.9150669574737549 confidence score in epoch 38. Stopping early.

Confirming that the model does predict the above image as class 724 (pirate ship):

model(adv).argmax()
tensor(724, device='cuda:0')
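The only change relative to synthetic-image generation is the starting point, which can be seen in this sketch (my illustration, without the scheduler or early stopping):

```python
import torch

def fool(model, img, target_class, epochs=100, lr=0.5):
    """Adversarial perturbation: gradient ascent on the target class score,
    starting from a real image instead of random noise."""
    model.eval()
    adv = img.clone().requires_grad_(True)
    opt = torch.optim.SGD([adv], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        (-model(adv)[0, target_class]).backward()  # ascend on target score
        opt.step()
    return adv.detach()
```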

7. Feature inversion

The feature_inversion function reconstructs an input image using the intermediate feature representations of one or more modules.

Parameters:

  • model: PyTorch image classifier
  • modules: list of modules
  • img_path: path to image
  • epochs: number of epochs to execute gradient ascent for
  • lr: learning rate
  • step_size: step size for learning rate scheduler, defaults to 100
  • gamma: gamma used for learning rate scheduler, defaults to 0.6
  • mu: regularization factor for total variation regularizer
  • device: device to use for computation, defaults to cuda
  • figsize: size of the pyplot figure, defaults to (6, 6)
  • save_path: path where generated plot is saved, defaults to None

The feature vector of the input image at a given module is recorded. A new image is then generated to minimize the sum of two terms: the distance between its feature vector and that of the original input image, and a total variation regularizer (which encourages the image to look "natural"). Below are the outputs of the function when the image is reconstructed using the outputs of the first, second, and last convolutional layers of the model. As seen below, earlier layers tend to produce reconstructions that closely resemble the input image, which shows that information is lost as the image passes through the model.

modules = [model.conv_stem, model.blocks[0][0].conv_dw, model.blocks[-1][0].conv_pwl]
feature_inversion(model, modules, 'chow.jpg', 100, 1e-3)
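The optimization behind feature inversion, sketched for a single module (my illustration with an L1-style total variation term; the library's exact regularizer may differ):

```python
import torch

def total_variation(x):
    """Sum of absolute differences between neighboring pixels,
    encouraging spatially smooth, 'natural-looking' images."""
    return ((x[..., 1:, :] - x[..., :-1, :]).abs().sum() +
            (x[..., :, 1:] - x[..., :, :-1]).abs().sum())

def invert_features(model, module, target_img, epochs=50, lr=0.05, mu=1e-4):
    """Optimize a random image so its features at `module` match the
    target image's features, plus a TV regularizer weighted by mu."""
    feats = []
    handle = module.register_forward_hook(lambda m, i, o: feats.append(o))
    with torch.no_grad():
        model(target_img)
    target_feat = feats.pop().detach()  # features to be matched

    recon = torch.randn_like(target_img, requires_grad=True)
    opt = torch.optim.Adam([recon], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        feats.clear()
        model(recon)  # hook captures the reconstruction's features
        loss = (feats.pop() - target_feat).pow(2).mean() \
            + mu * total_variation(recon)
        loss.backward()
        opt.step()
    handle.remove()
    return recon.detach()
```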

8. Deep Dream

The deep_dream function modifies an input image in order to maximize the activations of an intermediate layer.

Parameters:

  • model: PyTorch image classifier
  • module: module whose outputs are maximized
  • img_path: path to image
  • epochs: number of epochs to execute gradient ascent for
  • lr: learning rate
  • step_size: step size for learning rate scheduler, defaults to 100
  • gamma: gamma used for learning rate scheduler, defaults to 0.6
  • device: device to use for computation, defaults to cuda
  • figsize: size of the pyplot figure, defaults to (12, 12)
  • save_path: path where generated plot is saved, defaults to None

A given input image is modified in order to maximize the outputs of module. This is a very simplistic implementation of Deep Dream. For the same input image, the outputs of this function may vary depending on the model weights.

dream = deep_dream(model = model, module = model.blocks[-2][0].conv_dw,
                   img_path = 'chow.jpg', epochs = 100, lr = 2)
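The core loop is gradient ascent on the image to maximize the energy of the module's activations, again using a forward hook. A minimal sketch (my illustration, not the library's exact implementation):

```python
import torch

def dream(model, module, img, epochs=30, lr=0.05):
    """Gradient ascent on the input to maximize the mean squared
    activation of `module` -- the essence of Deep Dream."""
    acts = []
    handle = module.register_forward_hook(lambda m, i, o: acts.append(o))
    img = img.clone().requires_grad_(True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        acts.clear()
        model(img)  # hook captures the module's current activations
        loss = -acts.pop().pow(2).mean()  # negate => ascend on energy
        loss.backward()
        opt.step()
    handle.remove()
    return img.detach()
```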

The above functions will work with any PyTorch image classifier, though you may need different hyperparameters for different models and functions. If you notice any bugs or missing citations, or have feedback, code optimizations, or feature requests, please let me know through GitHub.